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(54) Title: REAL-TIME ESTIMATION OF LONG RANGE DEPENDENT PARAMETERS 
(57) Abstract 

A method and apparatus of estimating long range dependent parameters, such 
as Hurst parameter H and size parameter cf, of a data stream in real-time. It includes 
inputting blocks of data of the data stream to a real-time discrete wavelet decomposition 
means (200) used to generate wavelet coefficients. A sum of squares of the coefficients 
at each scale is maintained together with the number of elements combined in the 
sum within processor means, such as a PC (304). When an estimation is required, 
averages of the squares of the coefficients are formed, followed by the calculation of 
a weighted linear regression using the averages leading to a detennination of estimates 
of the parameters. 
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REAL-TIME ESTIMATION OF LONG RANGE DEPENDENT 



PARAMETERS 



The present invention relates to a real-time method of estimating parameters used 
to characterise a Long Range Dependent (LRD) process such as is found in data 
5 streams. The present invention more particularly relates to a method of estimating 
the Hurst (H) parameter and size parameter Cf of a teleconmiunications data traffic 
sequence in real-time which assists in the analysis of data in teletraflfic 
appUcations. 

10 The phenomenon of LRD has recently attracted strong interest in 
telecommunications with the discovery of self-similar and long range dependent 
properties in data and conmiunications traffic of diverse types. The investigation 
of the impact of this on telecommunication network performance has highlighted 
the need for accurate and computationally effective estimation methods for LRD 

15 parameters. 

Long range dependence is known to be present in a wide variety of generalised 
data types including data traffic in high speed telecommunications networks. The 
presence of LRD allows one to predict trends in data traffic where there is a strong 
20 correlation between sets of the data. A common definition of LRD is the slow, 
power-law like decrease at large lag of the autocovariance function of a stationary 
stochastic time series {Xt}. Equivalently, it can be defined as the power-law 
divergence at the origin of the spectrum of that series: 



25 



fx(u)-Cf|ur,|u|->0 



(1) 



or 



Cf|u 




where H = (1 + a)/2 is the Hurst parameter, 0.5 < H < 1, a is the "scale" parameter 
and Cf is the "size" parameter. 
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Essentially the LRD phenomenon states that the sum of all correlations 
downstream from any given time instant is always appreciable, even if individually 
the correlations are small Crucially, it implies that there is no possibility of 
defining a characteristic time-scale beyond which correlations would have 
5 essentially disappeared, as would be the case for a process whose autocorrelation 
function decayed exponentially - the classical assumption. Thus, one cannot find a 
reference unit of time over which, for instance, some property of the data could be 
reliably measured. Instead of a single prominent time scale, LRD is characterised 
by scale invariance properties governed by the parameter a which describes the 
1 0 relationship between scales. 

The simplest definition of LRD involves the two parameters a and Cf mentioned 
hereinbefore, of which a is more important. 

15 As a, and therefore H, appear in the exponent of (1) it defmes the existence of the 
LRD phenomenon and govems the characteristic scaling behavior of a LRD 
process as well as statistics derived firom it including basic ones such as the sample 
mean of the series. Thus H gives a measure of the rate of decay of correlation 
between sets of data. It is therefore important that a is estimated well. 

20 

The parameter Cf plays a major role in fixing the absolute size of LRD generated 
effects, the general character of which is detemuned by H. Estimating Cf is an 
issue of importance for quantitative analysis, but is fraught with the same statistical 
difficulties intrinsic to H estimation, 

25 

There exist estimators of the Hurst parameter that are either time-domain based or 
firequency-domain based which have poor statistical perfomiance including high 
bias and/or high variance. 

30 However, estimators using wavelet analysis can provide a natural, statistically and 
computationally efficient estimation of the Hurst parameter H, in an unbiased 
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manner. Wavelet analysis is a tool which studies the scale dependent properties of 
data directly via the coefficients of a joint scale-time wavelet decomposition. The 
wavelet decomposition takes into account the scaling behavior of a process by 
examination over a multitude of scales. As such, very little needs to be assumed 
5 about the underlying process. Should evidence of LRD be found, it then offers an 
unbiased semi-parametric estimator which can be efficiently implemented using 
techniques from discrete multi-resolution analysis (MRA) (see P. Abry, P. 
Goncalves and P. Flandrin - "Wavelets, Spectrum Estimation, 1/f Processes", 
Wavelets and Statistics, Lectures Note in Statistics, Vol. 105 (1995), pp. 15-30). 

10 

However, such wavelet estimators only take into account the "static" analysis of a 
collection of data up to a particular time instant or over a time period. As such, it is 
an off-line process requiring a vast amount of memory to store each and every data 
set for the analysis thereof. If new sets of data are required to be added to the 
15 existing collection of data and thereafter analysed, all of the data sets are still 
required to be maintained and therefore more memory is required for the storage of 
the data sets. As further data sets are added, it becomes impractical to have vast 
arrays of memory storage devices. 

20 There is a need to be able to analyse real-time data, for example, teletraffic data so 
that the effect of additional data on a teleconmiunications network can be 
considered instantly without the requirement of storing vast amounts of data sets 
over a particular time interval. 

25 The present invention provides for a real-time method of estimatmg LRD 
parameters of on-line data streams based on wavelet analysis without the need for 
large memory storage and rapidly enough to handle very high data rates. 

The method scales naturally so that as data transfer rates become higher over time, 
30 the method will be implementable and effective in terms of speed and memory 
storage. 
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According to a first aspect of the invention there is provided a method of estimating 
long range dependent (LRD) parameters, such as Hurst parameter H and size parameter 
Cf ,of a data steam in real-time, said method comprising the steps of: 

5 inputting each block of data of said data stream to a real-time discrete wavelet 

decomposition means; 

extracting wavelet coefficients generated firom said wavelet decomposition 
means for each block of input data; 

maintaining a sum of squares of said wavelet coefficients and maintaining the 
10 number of elements combined in said sum at each one of a number of scales; 

at times when an estimation is required, the method further comprising the 
following steps: 

forming averages of said squares of said wavelet coefficients at said each one of 
a number of scales; 

15 performing a weighted linear regression using said averages over a range of 

scales; and 

detennining estimates for the LRD parameters on the basis of said Unear 
regression. 

Accordmg to a second aspect of the invention, there is provided a method of 
20 estimatmg LRD parameters, such as H and Cf , of a data stream in real-time, said 
method comprising the steps of: 

extracting packets of data firom said data stream; 

aggregating the packets over an n'*^ time interval to generate a data point x (n); 
inputting each data point x(n) to a real-time discrete wavelet decomposition means; 

25 extracting wavelet coefficients dj generated from said decomposition means for 

each data point x(n) ; 

maintaining a sum Sj of squares d^j of said wavelet coefficients and maintaining 
a number nj of data points used in said sum Sj at each scale j of a number of scales; 

at times when an estimation is required, the method fiirther comprising the 
30 following steps: 
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fonning averages of the squares d^j of said wavelet coefficients from said sum 
Sj at each scale j ; 

performing a linear regression using said averages over a range of scales; and 

determining estimates for the LRD parameters on the basis of said linear 
5 regression. 

The method may further include calculating confidence intervals for said estimates. 
Preferably, the wavelet coefficients are discarded after the step of squaring said 
wavelet coefficients. Preferably, prior to the performing step, a range of scales is 
10 chosen over which the linear regression is plotted. The step of choosing the range 
of scales may be performed manually on long time scales or automatically 
according to a suitable algorithm. 

The wavelet decomposition means may comprise a multi-resolution algorithm 
15 means or general filter bank means. The wavelet decomposition means may be 
implemented in software or hardware, for example, using a digital signal 
processing chip. 



The present invention also provides for apparatus for estimating LRD parameters, 
20 such as H and Cf , of a data stream in real-time, said apparatus comprising: 

means for receiving data packets of said data stream; 

pre-processor means for aggregating the received data packets over time 
intervals to generate respective data points x(n); 

a real-time discrete wavelet decomposition means for receiving each data point 

25 x(n); 

said decomposition means generating wavelet coefficients dj for each received 
data point x(n); 

means for calculating a sum Sj of squares d^j of said coefficients and deriving a 
number n; of data points used in said sum Sj at each scale j of a number of scales; 
30 memory means for storing Sj andiij; 
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wherein at times when an estimation of the LRD parameters is required, said 
means for calculating (a) forms averages of the squares d^j of said coefficients from 
said sum Sj at each scale j; 

(b) derives a linear regression using said averages over a range of scales, 

S and 

(c) determines estimates for the LRD parameters on the basis of said linear 
regression. 

A preferred embodiment of the invention will hereinafter be described with 
10 reference to the accompanying drawings wherein: 

Figure 1 is a block diagram showing the algorithm used to calculate the LRD 
parameters fix>m a data trafific stream; 

15 Figure 2 is a schematic diagram of a wavelet decomposition means in the form of a 
pyramidal filter-bank implementing a discrete wavelet transform to detennine the 
wavelet coefficients; 

Figure 3 is a block diagram of hardware used to implement the algorithm of 
20 Figure 1; 

Figure 4 is a flow diagram showing the steps involved in calculating LRD 
parameters using an off-line algoritimi; 

25 Figure 5 is a linear regression plot of yj versxis j, and 

Figure 6 is a log-log plot for Ethernet data. 

In Figure 1, there is shown a flow diagram 10 depicting a series of steps used for 
30 calculating the LRD parameters, H and Cf, in real-time starting firom the initial 
input of the data stream to be analysed. 
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At step 20, the process commences by interfacing with the physical media carrying 
ethemet traffic in order to extract the traffic packets. The traffic packets are then 
filtered at step 30 to extract relevant data. The process then progresses to step 40 
where the relevant data is aggregated over the n* time interval to generate the next 
5 data point x(n) of the series to be analysed. Each data point x(n) generated is input 
to a real-time filter bank at step 50 which is used to extract and update the Discrete 
Wavelet Transform (DWT) coefficients, to be described with reference to Figure 2. 
After the coefficients have been determined for each data point x(n), then at step 
60, each new data point is added to the existing number of data points for each 
10 scale j so that (nj+l) replaces nj as the new updated total of data points. The sum of 
squares Sj is also updated to include the new coefficients squared at that scale. 
Infomiation on the existing coefficient is discarded. Finally at step 70, at suitable 
time intervals, a calculation of time average ^ij for each j is made and the LRD 
parameters H and Cf are calculated to be discussed later. 

15 

A discrete wavelet transform (DWT) is perforaied on each data point x(n) wherein 
a number of wavelet coefficients are produced through a wavelet decomposition 
process using the real-time filter bank. 

20 The wavelet transform can be understood as a more flexible form of a Fourier 
Transforai, where the original signal is transformed, not into a firequency domain, 
but into a time-scale wavelet domain. The sinusoidal fiinctions of Fourier theory 
are replaced by wavelet basis fiinctions generated by simple translations and 
dilations of the the mother wavelet \|/o, itself defined via multiresolution theory (see 

25 I. Daubechies, "Ten Lectures on Wavelets", SIAM (1992)). The wavelet transform 
can be thought of as a method of simultaneously observing the signal at a fiill 
range of different scales or resolutions j. The wavelet coefficients of each data 
point x(n) essentially comprise details dj and approximations aj for each scale j. 

30 In Figure 2, there is shown a wavelet decomposition means 200 which may be a 
multi-resolution algorithm in the fomi of a real-time filter bank. Each data point 
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x(n) in the series to be analysed over an interval of time is input to the wavelet 
decomposition means 200 and fed through band-pass filter (BPF) 202 and 
downsampler 204. The wavelet coefficient, or detail, of the data point is output at 
the first scale as d^ Part of the input data point x(n) is fed to low-pass filter (LPF) 
5 206, through downsampler 208 and then is split again. One portion is input to BPF 
210 and downsampler 212 to extract the coefficient di at scale 2. Another portion 
is input to LPF 214 and downsampler 216. The process is repeated to extract the 
coefficients up to scale j. Therefore, output di is updated for every second new 
data point x(n), output d2 is updated for every fourth new x(n) and output dj is 
1 0 updated for every (2^)* new x(n). 

Each BPF and LPF in the wavelet decomposition means 200 is of a finite length K, 
so that only K input values are held in the memory of each BPF or LPF. Once a 
value, for example x(n) itself which is input to BPF 202, has propagated through 
15 the filter, the value is dropped or discarded. Storage requirements are therefore 
fixed for each filter with K. log2(n) values being stored in the filter bank overall. 
Each of the filters may be implemented by a Finite Impulse Response (FIR) filter. 
The wavelet decomposition means may be implemented in software or in 
hardware, for example, on a DSP chip. 

20 

Referring to Figure 1, after the DWT coefficients dj are updated, using x(n), in the 
real-time filter bank in step SO, a number of actions occur in step 60. For each x(n) 
that is updated at each scale j: 

the number of wavelet coefficients at that scale is mcremented by 
25 one, that is, is replaced by (nj+ 1); 

the existing sum Sj of squared coeflBcients is replaced by (Sj + dj^) to 
produce an updated sum of squares. The coefficient value dj is then discarded as it 
is no longer required. 

30 Therefore, all that is required to be stored in memory is the updated sum Sj and the 
number of data points Uj. Note that it is only when a dj is updated in step 50, that it 
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is forwarded onto the next step 60 for the calculation of the new nj and Sj and then 
dj is discarded thereafter. The new values nj, Sj are stored in memory residing in 
the RAM. 

5 In step 70, at various time intervals suitable to the user, a time average ^ij is 
calculated for each j, where 

«A J) nj 

using the stored Sj and nj for each scale. Then, based on the values for (nj) and (^ij), 
10 an estimation of the LRD parameters H and Cf is performed using a wavelet-based 
joint estimator or ofif-line algorithm, to be described hereinafter. 

Figure 3 shows the hardware, in block forai, used to implement the algorithm of 
Figure 1. A Network Interface Card (NIC) 302 is used to interface with, and 

15 capture traffic packets from the Ethernet 300. The Ettiemet may be 10 Base T or 
10 Base 2. It is then presented to PC 304 which comprises an Intel Pentium PC 
running a version of tiie UNIX(R) operating system named Free BSD. The PC 304 
runs the traffic analysis software, written in C and performs the necessary 
calculations for obtaining a sum Sj of squares d^j, deriving the Uj of data points used 

20 in the sum Sj. Such software may also be used to form averages of d^j at each scale 
j, subsequently derive a linear regression, determine an estimate for H and Cf and 
calculate any updates for Sj and n, on arrival of new data points x(n) Alternatively, 
hardware such as DSP chips may be used to calculate a sum Sj of squares d^j, 
derive Uj and form averages jij of d^j at each scale j.. The PC 304 includes a 

25 software preprocessor 306, a DWT and Estimator Unit 308 and memory unit 310. 
The preprocessor 306 preprocesses the traffic measurements through the use of the 
Berkeley Packet Filters, and outputs a time series of the number of bytes per time 
interval. The preprocessed traffic measurements are passed to unit 308 which 
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updates the on-going wavelet decomposition, summary statistics and Hurst 
parameter estimation, and outputs the results periodically to a printer, plotter or 
other device. The memory unit 310 is in the foraa of RAM and is used to store the 
preprocessed traffic measurements, wavelet coefficients derived firom the 
5 measurements and summary statistics. Note that only the N most recent 
measurements and coefficients at each scale are required, where N is the length of 
the FIR filters used to implement the LPFs and BPFs used in the wavelet 
decomposition. The remainder of the data has been condensed and stored in the 
summary statistics. 

10 

In Figure 4, there is shown a flow diagram 400 of the steps involved in estimating 
LRD parameters H and Cf using the wavelet-based joint estimator described in "A 
wavelet-based Joint Estimator of the Parameters of Long-Range Dependence, 
Technical Report SERC-0043, 1997 by D. Veitch and P. Abry". The steps 410 to 
15 470 are self-explanatory and what follows is a detailed description of the steps. 

The Wftvelgt Estimator 

In the analysis of the LRD phenomenon, the following two features, (F1,F2), of the 
family of wavelet basis functions play key roles: 

20 

Fl: The family of wavelet basis functions generated firom the mother wavelet 
v|/o are constructed firom the dilation or change of scale operator: 

¥j,o(0-l''¥o(rt) (3) 

This means that the analyzing family exhibits, by construction, a scale invariance 
feature. The LRD phenomenon can be understood as the absence of any 
25 characteristic frequency (and therefore scale) in the range of frequencies close to 
the origin. The LRD property can thus be interpreted as a scale invariance 
characteristic which is efficiently analysed by wavelets. 
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F2: vj/o has a number N of zero or vanishing moments which can be freely 
chosen provided N ^ 1. By definition this means that |t*^ ij/q (t) dt s 0, k = 
0,...., N - 1 (but not for k > N), or equivalently that the Fourier Transform of 
V|/o satisfies |To(u)| = O (|o|^) at the origin. This property can be used to 
5 control divergences arising with processes having power-law spectra at the 

origin. 

For a process with a power-law spectrum such as a LRD process, these features 
engender the following key properties of the wavelet coefficients dj over a range of 
10 scales 2*, j = ji-.-ja where the power-law scaling holds. 

PI: Due to Fl, the scale invariance (the power-law behavior) is captured 
exactly: 



15 IEd/ = 2^'^CfC (4) 

where 

C= J |on%(o)pdu. (5) 



This exact recovery of a power-law is not a trivial effect and results directly from 
20 the dilation operator underlying the design of the wavelet basis. Time-frequency 
or periodogram based estimates would not exhibit such a feature. 

P2: Due to Fl and F2, the dj are a collection of random variables which are 
quasi-decorrelated (see "Wavelet Analysis and Synthesis of Fractional 
25 Brownian Motion" by P. Flandrin, IEEE Trans, on Info. Theory IT-38 

(1992) pp. 910-917). In particular, the long-range dependence present in 
the time domain representation is completely absent in the wavelet 
coefficient plane {j,k}. 
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Elaborating on Property P2 it has been shown that correlations in the time-scale 
plane decay at least hyperbolically in all directions with exponents controlled by 
the number of vanishing moments and corresponding to short range dependence. 
Since by definition the octave j = logi (scale), this implies exponential decay in 
5 octave j. 

From hereon log2 will denote the logarithm base 2, whereas In will denote natural 
logarithms. 

10 The intuitive basis of the estimator can be found by analysing equation (4). 
Rewriting it as log2 (lEdxO,.)^) = ja + log2(CfC) strongly suggests a linear 
regression approach for estimatmg (a,cc), where clearly the slope of the regression 
would estimate a and the intercept would be related to Cf. The idea of using a log- 
log plot is common to many contexts when an exponent is the object of interest. 

15 The real issue is to what extent the promise of this simple linear form is realised in 
the resulting estimator, once the inevitable complications are taken into account. 

The first, essential, complication is of comrse that lEdj', a second order quantity that 
can be related to the spectrum of x, is not known but must be estimated. In the 
20 present context this is the principal difficulty as it is well known (see "Statistics for 
Long-Memory Processes", J. Beran, Chapman & Hall (1994)) that the estimation 
of second ord^ (and other) quantities in a long range dependent context is a 
delicate task. Here, however, property P2, the quasi-decorrelation of the dj allows 
us to effectively use the simple "time average" 

Mr-^d] (6) 
nj 

25 

where nj is the number of coefficients at octave j available to be analysed. This 
quantity is an unbiased and consistent estimator of lEdj^. (Note that is the 
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sample variance of dj, since from F2 the expectation of dj is identically zero for 
eachj). 

Thus the sample data dj for nj samples at scale j of the wavelet decomposition is 
5 squared and then summed and divided by nj to form the "time average". 

The second compUcation is the non-linearity introduced by the loga, which biases 
the estimation. We will see below how this problem also can be circumvented 
under reasonable hypotheses. Simplifying things slightly, we confirm that the 
10 fundamental approach underlying our estimator is indeed a Iraear regression of 
logzdJj) on log2(2^) = j. A weighted linear regression will be used as the variances 
of the log2(M3) with j. 
Lingar regression 

15 The fundamental hypothesis of linear regression is lEyj = bxj + a. We define the 
quantities S = Z l/o\ Sx = 2 xj/oj^ and Sxx ~ SxjVoj^ where a^j is an arbitrary 
weight associated with yj. The usual xmbiased estimator (b,a) of (b, a) is 



0- = Lwjyj. (o) 



20 



a= sj^vjyj. (9) 

SSm'St 



where the weights wj and Uj satisfy Zwj = Zjuj = 0, Zjwj = Zuj = 1. Note that these 
conditions imply that there are always both positive and negative Oj and Wj. 



25 



If in addition the yj are mutually independent then the covariance matrix is given 
by 
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ss„-s: 



2 • 



(11) 



SSxx" Sx 



r=^'Sx/^|sSxx. (13) 

5 

where r is the correlation coefficient If xj > 0 for each j it is easy to see that r will 
be negative, and large in magnitude if Xi is large, as a small change in tihe slope "to 
the right" will result in an amplified change of opposite sign in the intercept. 

10 Finally, if we set cj = Var(yj), then (b, a) is the minimum variance unbiased 
estimator (MVUE) (see "Fundamentals of Statistical Signal Processing" by 
S.M. Kay, Prentice-Hall (1993)) with covariance matrix as above. 

Note that in the event of small errors in the values of the cj^ and small correlations 
15 between the yj, the estimator remains unbiased and its covariance matrix can be 
accurately estimated by the expressions just given. 

Thus far we have indicated tiiat log2 (fij) is the variable yj of the desired linear 
regression satisfying EEyj =^ bj + a. Since IE log2 (Pj) ^ log2(IEpj) = ja + logi (CfC) 
20 in general, this cannot be exactly true, although under the conditions H1-H3 below, 
and also assuming nj large, it can be established that 
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N 



log, C, 



nln^ 2) 



where signifies equality in distribution and N(|i,a*^) is a Gaussian random 
variable. In a LrD context however, the large scales are usually the most 
important to consider, and it is precisely there that the nj are not large. Here the 
conditionj is removed by examining the distribution of log2(}ij) in more detail. It 
turns out that this refinement leads to only a small improvement in the estimation 
of a, but has veiy important implications for the estimation of Cf. 



10 Throughout the analysis it is instructive to bear in mind that the number of 
available detail coefficients nj essentially decreases by half as the scale is doubled, 
that is Uj+i K nj/2, and therefore that nj+i » n2"^ where n is the length of the initial 
data. 



15 We assume that the following supplementary hypotheses hold true. 
HI : The process x, and hence the processes d(j,.), are Gaussian. 
H2: For fixed j the process d(j,.) is iid. 
H3: The processes d(j,0 and d(j V), j j', are independent. 



20 Hypothesis HI is justified by numerical evidence which shows that the method is 
very insensitive to the form of the marginal distributions of x. Hypotiieses H2 and 
H3 are both well justified by property P2 (they are separated to make it clearer 
which properties are needed where). 

25 These extra conditions, whilst appearing very restrictive at first glance, are in fact 
very reasonable in practical terms, as borne out in simulations. The reason for this 
is that the underlying effectiveness of the method is based on PI and P2, H1-H3 
being added only to extend the quantitative analysis. 



30 



Let the density of a Chi-squared variate Xu ^ X^^ be denoted by 
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1 



The mean and variance of such a variate are u 



.2^'^r(v/2)j 

and 2u respectively. Also set Zj = 2^° CfC. From HI and H2 and equations 4 and 6 
we have 



(14) 



where lEuj = ^ as p, is unbiased, and therefore 



- ja + log2 CfC - logj n. + lnX„^ /ln2. 



(15) 



Thus the study of log2(|Jj) reduces to that of the logarithm, of a Chi-squared 
variable. 

Using the relations e""^ dx = — r (u)(t(«) - Inp], Reji > 

OJReo >0 

(see Table of Integrals, Series and Products", I.S. Gradshteyn and IM. Ryzhik, 
Academic Press, corrected and enlarged edition (1980)), equation j 4.352), and 

i^x"'e''(Inx/dx = -^r(v)[(Yf(v).)iifi/-i-(^(2.v)J. Re^t > Reu > 0, (above 
M 

reference equation 2, ^ 4.358) where \|/(z) = r*(2) fTiz) is the Psi function and C, (2,u) is 

a generalised Riemann Zeta fiincdon, it is straightforward to show from the definition of 
f;,(x) above that 
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where the teim 



17 



IE\aXu'¥(ty2)+ln2, (16) 



Var(\axJ = i(2.v/2). (17) 



IE\og,(^j)=ja + log,cfC + gj. (18) 



Var(log,(Mj))-C(2.nj/2)/]n'2. (19) 



gj = w(nj/2)/ln2 - log/«y /2). (20) 



10 a negative function of nj only, can be easily calculated for all values of nj. 

The term is a small corrective term which is substracted to account for the 
distorting non-linearity introduced by the log. 

15 For future reference, below are the asymptotic form for n, large of the quantities 
above: 



-1 



n.ln2 



(21) 



Vcir(log,(Mj)) ^ 

nyln 



(22) 
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Defining the variable yj as yj s log2 (pj) - gj (23) 

we see that from the above description it is clear that under HI and H2 they obey 

IEyj^ja^\og,CfQ (24) 

5 

Var(yj)^i(2.nj/2)/\rC2. (25) 

and thus satisfy the requirements for a weighted linear regression. 

A weighted regression estimation {&, a) of yj on j = Xj is perfomied according to 
10 equations 8 and 9 with oj^ = Var(yj). 

An example of the regression fit using a simulated data set is given in Figure 5 
where yj = logiCMj^gj is plotted against j and showing 95% confidence intervals. 
The 95% confidence intervals for each yj, shown as vertical lines at each octave j, 

15 are seen to increase with j. This can be understood fix)m equation (22), 
remembering that nj « n2'^ meaning that the number of data points halves with 
every increase in j by one. The mtervals are derived fi'om the known sample 
variances a/ of the estimates yj imder gaussian assumptions. An LRD process is 
apparent between scales 4 and 10 whereas strong SRD or short range dependence 

20 is apparent for scales less tiian 4, and particularly for the range j=l to j=3. The 
vertical bars at each octave give 95% confidence intervals for the yj. The series is 
simulated farima (0,d,2) with d=0.25 (a=0.50) and ^ = [-2, -1] implying Cf = 6.38. 
Selecting O u ji) = (4,10) identifies the relevant scaling range allowing an accurate 
estimation despite the strong SRD: a = 0.53 ± 0.7, Cf= 6.0 with 4.5 < Cf < 7.8. 

25 

In Figure 6, there is shown a log-log plot for real Ethernet data. The data is firom 
an Ethernet trace and contains in excess of 30 million observations which took 
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only a few seconds to analyse. It plots log2(Mj) versus j. It shoes an example of the 
wavelet based scale analysis for contmuous time Ethernet data. The asymptotic 
LRD behaviour is seen to enter at octave j=14. The estimate of the self-similarity 
parameter H=l - J3\ 2 is H=0.8. 
5 By using the wavelet decomposition means, the static LRD estimation method 
provides significant advantages in terms of memory storage, memory usage and 
calculation times. Using it, the input data can be split into blocks, analysed and 
recombined, so that memory problems are not encountered in treating data of 
arbitrary length n. The run time complexity of the method is very low, of the order 

10 h or 0(n), making it very suitable for the analysis of very large data sets. These 
advantages make the method suitable for real-time implementation where the data 
is collected and the LRD parameters estimated on a continuous, on-the-fly basis. 
In the future, when the total amount of data to be processed rises from megabits to 
gigabits, and even terabits, storage problems will not arise as the requirements of 

1 S the real-time algorithm vary as the logarithm of the data length. 

The real-time version described here and the working preferred implementation 
prove that these potential real-time advantages can be achieved in practice. 



20 
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CLAIMS 

1. A method of estimating long range dependent (LRD) parameters, such as Hurst 
parameter H and size parameter Cf ,of a data steam in real-time, said method comprising 
the steps of: 

5 inputting each block of data of said data stream to a real-time discrete wavelet 

decomposition means; 

extracting wavelet coefficients generated from said wavelet decomposition 
means for each block of input data; 

maintaining a sum of squares of said wavelet coefficients and maintaining the 
1 0 number of elements combined in said sum at each one of a number of scales; 

at times when an estimation is requu-ed, the method further comprising the 
following steps: 

forming averages of said squares of said wavelet coefficients at said each one of 
a number of scales; 

15 performing a weighted linear regression using said averages over a range of 

scales; and 

determining estimates for the LRD parameters on the basis of said linear 
regression. 

2. A method according to claim 1 wherein, for each updated scale, the existing 
20 sum of squares of said wavelet coefficients is replaced by a new sum of squares 

equivalent to the existing sum added to the square of each new wavelet coefficient. 

3. A method according to claim 1 or claim 2 wherein, for each updated scale, the 
existing number of elements combined in said existing sum is incremented by the 
number of new elements at each scale. 

25 4. A method according to claim 2 or claim 3 wherein the value of each coefficient 
is not retained or stored after each update. 

5. A method according to any one of the previous claims wherein each block of 
data input to the decomposition means is not retained or stored. 

6, A method according to any one of the previous claims wherein the step of 
30 forming averages includes fonning a time average at each scale equivalent to the 
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updated siun of squared coefficients divided by the updated number of elements 
combined in the updated sum. 

7. A method according to any one of the previous claims wherein the linear 
regression is plotted and prior to the perfonning step, a range of scales is chosen. 

5 8. A method according to any one of the previous claims further comprising the 
step of calculating confidence intervals for each of the derived estimates. 

9. A method of estimating LRD parameters, such as H and Cf , of a data stream in 
. real-time, said method comprising the steps of: 

extracting packets of data from said data stream; 

10 aggregating the packets over an n* time interval to generate a data point x (n); 

inputting each data point x(n) to a real-time discrete wavelet decomposition means; 

extracting wavelet coefficients dj generated from said decomposition means for 
each data point x(n); 

maintaining a sum Sj of squares d^j of said wavelet coefficients and maintaining 
1 S a number nj of data points used in said sum Sj at each scale j of a number of scales; 

at times when an estimation is required, the method further comprising the 
following steps: 

forming averages of the squares d^j of said wavelet coefficients from said siun 
Sj at each scale j; 

20 perfonning a linear regression using said averages over a range of scales; and 

determining estimates for the LRD parameters on the basis of said linear 
regression. 

10. A method according to claim 9 wherein, for each updated scale j , the sum Sj is 
replaced by a new sum (Sj+d^j ) to yield an updated sum of squares. 

25 11. A method according to claim 9 or claim 10 wherein, for each x(n) updated at 
scale j, the number of data points nj , or equivalently the nimiber of coefficients at scale 
j, is incremented by one such that nj is replaced by (nj+1). 

12. A method according to any one of claims 9 to 1 1 wherein the value of each dj is 
not retained or stored after each update. 
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13. A method according to any one of claims 9 to 12 wherein each data point x (n) 
input to the decomposition means is not retained or stored. 

14. A method according to any one of claims 9 to 13 wherein the step of forming 
averages comprises forming a time average \i j at each scale j such that 

5 

ny V J J nj 
using the maintained Sj and nj for each scale. 

15. A method according to claim 14 further comprising the step of calculating a 
10 random variable yj where 

yj = log2(tij)-gj 

where gj is a small corrective term. 
15 16. A method according to claim 1 5 further comprising the step of plotting yj 
against j with confidence intervals about j based on cxj where cr is an arbitrary 
weight associated with j . 

1 7. A method according to claim 1 6 wherein after the plotting step a scaling range 
is chosen on which to base the linear regression. 
20 18. A method according to claim 1 7 wherein following selection of the scaling 
range, a weighted linear regression of yj on j is performed in the selected scaling range 
with weight crj^ . 

19 A method according to claim 18 wherein an estimate of H (or a where a= 
2H-1) is obtained from the slope of the regression and an estimate of Cf is obtained 
25 from the intercept of the regression. 

20. A method according to claim 1 9 further comprising the step of calculating 
confidence intervals of the estimate of LRD parameters H and Cf . 

2 1 . Apparatus for estimating LRD parameters, such as H and cr , of a data stream in 
real-time, said apparatus comprising: 
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means for receiving data packets of said data stream; 
pre-processor means for aggregating the received data packets over time 
intervals to generate respective data points x(n); 

a real-time discrete wavelet decomposition means for receiving each data point 

5 x(n); 

said decomposition means generating wavelet coefficients dj for each received 
data point x(n); 

means for calculating a sum Sj of squares d^j of said coefficients and deriving a 
number nj of data points used in said sum Sj at each scale] of a number of scales; 
10 memory means for storing Sj and nj; 

wherein at times when an estimation of the LRD parameters is required, said 
means for calculating (a) forms averages of the squares d^j of said coefficients firom 
said sum Sj at each scale j; 

(b) derives a linear regression using said averages over a range of scales, 

15 and 

(c) determines estimates for the LRD parameters on the basis of said linear 
regression. 

22. Apparatus according to claim 21 wherein, for each updated j, the sum Sj , is 
replaced by a new sum (Sj + dj^), to yield an updated sum of squares, which new sum 

20 is subsequently stored in said memory means. 

23. Apparatus according to claim 21 or claim 22 wherein, for each x(n) updated at 
scale j, the number of data points Uj , or equivalently the number of coefficients at scale 
j, is incremented by one , such that nj is replaced by (nj +1) and the updated value (nj+ 
1 ) is subsequently stored in said memory means. 

25 24. Apparatus according to any one of claims 21 to 23 wherein said means for 
calculating is implemented in software. 

25. Apparatus according to any one of claims 21 to 23 wherein said means for 
calculating calculates the sum Sj, derives Uj and forms said averages using hardware 
and derives said linear regression and said estimates using software. 
30 26. Apparatus according to any one of claims 21 to 25 wherein the value of each dj 
is not retained or stored in said memory means. 

27. Apparatus according to any one of claims 21 to 26 wherein each data point x(n) 
received by said decomposition means is not subsequently retained or stored by said 
memory means. 
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28. Apparatus according to any one of claims 21 to 27 wherein said pre-processing 
means, said decomposition means and said memory means are part of a computing 
processor such as a PC. 

29. Apparatus according to any one of claims 21 to 28 wherein said decomposition 
5 means comprises a multi-resolution algorithm means. 

30. Apparatus according to claim 29 wherein said multi-resolution algorithm means 
is a real-time filter bank comprising a series of low-pass filters (LPFs) and band-pass 
filters (BPFs). 

3 1 . Apparatus according to claim 30 wherein said LPFs and said BPFs are finite 
10 impulse response filters. 

32. A method or system substantially as hereinbefore described with reference to 
the accompanying drawings. 
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